Quantized Stochastic Gradient Descent: Communication versus Convergence
Abstract
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a sublinear number of bits per iteration in the model dimension, and can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost, while being competitive with standard uncompressed techniques on a variety of real tasks.
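The abstract describes compressing gradients by randomly quantizing each coordinate to a small set of discrete levels while keeping the quantized vector an unbiased estimate of the true gradient. The sketch below illustrates that idea in NumPy; the function name qsgd_quantize, the level count s, and the dense encoding are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def qsgd_quantize(v, s=16, rng=None):
    """Stochastically quantize gradient vector v to s levels per coordinate.

    A minimal sketch of randomized gradient quantization: each coordinate is
    mapped to one of s discrete levels, rounded up or down at random so that
    the quantized vector is an unbiased estimator of v.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    # Normalized magnitudes lie in [0, 1]; scale them to the s levels.
    scaled = np.abs(v) / norm * s
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part -> unbiasedness.
    prob_up = scaled - lower
    levels = lower + (rng.random(v.shape) < prob_up)
    return norm * np.sign(v) * levels / s

# Usage: the receiver only needs the norm, the signs, and the small integer
# levels, which is much cheaper to transmit than 32-bit floats per coordinate.
g = np.random.randn(1000)
g_hat = qsgd_quantize(g, s=16)
# Per-coordinate error is bounded by ||g|| / s for this scheme.
assert np.allclose(g_hat, g, atol=np.linalg.norm(g) / 16 + 1e-12)
```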
Similar resources
QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be ve...
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to its excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradient...
Preserving communication bandwidth with a gradient coding scheme
Large-scale machine learning involves the communication of gradients, and large models often saturate the communication bandwidth to communicate gradients. I implement an existing scheme, quantized stochastic gradient descent (QSGD), to reduce the communication bandwidth. This requires a distributed architecture, and we choose to implement a parameter server that uses the Message Passing Interfac...
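Since this entry is about saving communication bandwidth, a rough back-of-the-envelope comparison may help. The helper below is a hypothetical sketch that assumes a naive fixed-width packing (one float32 norm, one sign bit and a fixed number of level bits per coordinate), not the Elias-coded, sparsity-aware encoding QSGD actually uses, which is tighter.

```python
import math

def quantized_vs_raw_bytes(d, s=16):
    """Estimate per-iteration message size for a d-dimensional gradient:
    a quantized message (one float32 norm, plus one sign bit and
    ceil(log2(s + 1)) level bits per coordinate, densely packed) versus
    sending raw float32 coordinates. Returns (quantized_bytes, raw_bytes)."""
    bits_per_level = math.ceil(math.log2(s + 1))
    quantized_bits = 32 + d * (1 + bits_per_level)
    raw_bits = d * 32
    return quantized_bits / 8, raw_bits / 8

# Usage: for a million-parameter model and s = 16 levels, the quantized
# message is roughly 0.75 MB versus 4 MB of raw float32 gradients.
q_bytes, raw_bytes = quantized_vs_raw_bytes(d=1_000_000, s=16)
print(f"quantized ≈ {q_bytes / 1e6:.2f} MB, raw float32 ≈ {raw_bytes / 1e6:.2f} MB")
```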
Conditional Accelerated Lazy Stochastic Gradient Descent
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and convergence rate O(1/ε²), improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate O(1/ε⁴).
Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization
Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, f...
Journal:
Volume, Issue:
Pages: -
Publication date: 2016